10 research outputs found

DataHub: Collaborative Data Science & Dataset Version Management at Scale

Author: Bhardwaj Anant
Bhattacherjee Souvik
Chavan Amit
Deshpande Amol
Elmore Aaron J.
Madden Samuel
Parameswaran Aditya G.
Publication date: 02/09/2014

Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale. Comment: 7 pages

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Operationalizing Machine Learning: An Interview Study

Author: Garcia Rolando
Hellerstein Joseph M.
Parameswaran Aditya G.
Shankar Shreya
Publication date: 16/09/2022

Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., deploy and maintain ML pipelines in production. The process of operationalizing ML, or MLOps, consists of a continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in production. When considered together, these responsibilities seem staggering: how does anyone do MLOps, what are the unaddressed challenges, and what are the implications for tool builders? We conducted semi-structured ethnographic interviews with 18 MLEs working across many applications, including chatbots, autonomous vehicles, and finance. Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning. We summarize common practices for successful ML experimentation, deployment, and sustaining production performance. Finally, we discuss interviewees' pain points and anti-patterns, with implications for tool design. Comment: 20 pages, 4 figures

arXiv.org e-Print Archive

Revisiting Prompt Engineering via Declarative Crowdsourcing

Author: Asawa Parth
Jain Naman
Parameswaran Aditya G.
Shankar Shreya
Wang Yujie
Publication date: 07/08/2023

Large language models (LLMs) are incredibly powerful at comprehending and generating data in the form of text, but are brittle and error-prone. There has been an advent of toolkits and recipes centered around so-called prompt engineering, the process of asking an LLM to do something via a series of prompts. However, for LLM-powered data processing workflows in particular, optimizing for quality while keeping cost bounded is a tedious, manual process. We put forth a vision for declarative prompt engineering. We view LLMs like crowd workers and leverage ideas from the declarative crowdsourcing literature (including leveraging multiple prompting strategies, ensuring internal consistency, and exploring hybrid-LLM-non-LLM approaches) to make prompt engineering a more principled process. Preliminary case studies on sorting, entity resolution, and imputation demonstrate the promise of our approach.

arXiv.org e-Print Archive

Waltzing binaries: Probing line-of-sight acceleration of merging compact objects with gravitational waves

Author: Ajith Parameswaran
Arun K. G.
Kapadia Shasvath J.
Tiwari Avinash
Vijaykumar Aditya
Publication date: 13/07/2023

Line-of-sight acceleration of a compact binary coalescence (CBC) event would modulate the shape of the gravitational waves (GWs) it produces with respect to the corresponding non-accelerated CBC. Such modulations could be indicative of its astrophysical environment. We investigate the prospects of detecting this acceleration in future observing runs of the LIGO-Virgo-KAGRA network, as well as in next-generation (XG) detectors and the proposed DECIGO. We place the first observational constraints on this acceleration, for putative binary neutron star mergers GW170817 and GW190425. We find no evidence of line-of-sight acceleration in these events at $90\%$ confidence. Prospective constraints for the fifth observing run of the LIGO at A+ sensitivity suggest that accelerations for typical BNSs could be constrained with a precision of $a/c \sim 10^{-7}~[\mathrm{s}^{-1}]$, assuming a signal-to-noise ratio of $10$. These improve to $a/c \sim 10^{-9}~[\mathrm{s}^{-1}]$ in XG detectors, and $a/c \sim 10^{-16}~[\mathrm{s}^{-1}]$ in DECIGO. We also interpret these constraints in the context of mergers around supermassive black holes. Comment: Accepted to Ap

arXiv.org e-Print Archive

Decibel: the relational dataset branching system

Author: Deshpande Amol
Elmore Aaron J.
Goehring David G.
Madden Samuel R
Maddox Michael A
Parameswaran Aditya
Publication venue: 'VLDB Endowment'
Publication date: 01/05/2016

As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This not only wastes storage, but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these shortcomings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs. National Science Foundation (U.S.) (1513972); National Science Foundation (U.S.) (1513407); National Science Foundation (U.S.) (1513443); Intel Science and Technology Center for Big Data

DSpace@MIT

PubMed Central

eScholarship - University of California

Influence of plasma modification on mechanical and thermal properties of Polypropylene/ Nano-Calcium Silicate Composites

Author: Ajeesh. G
George Philip
Pandiyan Sundhara
Parameswaran Adarsh
Parvathy S.
Raman Aditya
Publication venue: 'EDP Sciences'
Publication date: 01/01/2018

The aim of the research is to study the influence of plasma modification on nano calcium silicate/polypropylene composites. Polypropylene (PP) is considered for this study as it possesses high impact strength, toughness and availability. Calcium silicate is considered as reinforcement because of its high temperature resistance, high flexural strength and high strength-to-mass ratio. Fourier transform infrared spectroscopy (FTIR) results show that there is a change in the functional group on the surface of calcium silicate after modification. Thermo-Gravimetric Analysis (TGA) and Differential Scanning Calorimetry (DSC) results show that the decomposition temperature increased with increasing amount of filler particles. It is also observed that the modification has produced a marginal increase in the decomposition and glass transition temperatures. Tensile test results show a gradual increase in the tensile properties of the composites when a high ratio of reinforcement is used. They also show a marginal increase in tensile strength when the composite is reinforced with modified calcium silicate, compared with non-modified calcium silicate. Scanning Electron Microscopy (SEM) reveals an enhanced dispersion of nanoparticles on modification. Based on the findings, it can be concluded that plasma modification enhances the thermal and mechanical properties marginally.

Directory of Open Access Journals

Influence of plasma modification on mechanical and thermal properties of Polypropylene/ Nano-Calcium Silicate Composites

Author: Adarsh Parameswaran
Aditya Raman
G Ajeesh.
Philip George
S. Parvathy
Sundhara Pandiyan
Publication venue: 'EDP Sciences'
Publication date: 09/01/2018

EDP Sciences OAI-PMH repository (1.2.0)

Collaborative data analytics with DataHub

Author: Bhardwaj Anant P.
Deshpande Amol
Elmore Aaron J.
Karger David R.
Madden Samuel R.
Parameswaran Aditya
Subramanyam Harihar G.
Wu Eugene
Zhang Rebecca
Publication venue: 'VLDB Endowment'
Publication date: 01/08/2015

While there have been many solutions proposed for storing and analyzing large volumes of data, all of these solutions have limited support for collaborative data analytics, especially given that many individuals and teams are simultaneously analyzing, modifying and exchanging datasets, employing a number of heterogeneous tools or languages for data analysis, and writing scripts to clean, preprocess, or query data. We demonstrate DataHub, a unified platform with the ability to load, store, query, collaboratively analyze, interactively visualize, interface with external applications, and share datasets. We will demonstrate the following aspects of the DataHub platform: (a) flexible data storage, sharing, and native versioning capabilities: multiple conference attendees can concurrently update the database, browse the different versions, and inspect conflicts; (b) an app ecosystem that hosts apps for various data-processing activities: conference attendees will be able to effortlessly ingest, query, and visualize data using our existing apps; (c) thrift-based data serialization that permits data analysis in any combination of 20+ languages, with DataHub as the common data store: conference attendees will be able to analyze datasets in R, Python, and Matlab, while the inputs and the results are still stored in DataHub. In particular, conference attendees will be able to use the DataHub notebook, an IPython-based notebook for analyzing data and storing the results of data analysis.

CiteSeerX

DSpace@MIT

PubMed Central

eScholarship - University of California

Hetero-bimetallic cooperative catalysis for the synthesis of heteroarenes

Author: Acheson
Aditya G. Lavekar
Alan Jones
Allegretti
Amat
Amat
Ameta
Ameta
Anant R. Kapdi
Anderson
Badaway
Badio
Bagle
Bagley
Balog
Bandaru
Barlin
Batey
Bennasar
Bevan
Bhatt
Bhilare
Bhilare
Boehm
Brennführer
Brigas
Brown
Cao
Cao
Carey
Carey
Carroll
Catalan
Chadha
Chancellor
Chatterjee
Chaudhari
Chauhan
Chen
Chen
Chen
Chen
Chen
Chen
Chen
Chetneni
Chiba
Chintharlapalli
Chiou
Choi
Choi
Chu
Chung
Congreve
Contractor
Cossy
Cui
De Miranda
Dehaen
DeLeon
Demir
Demir
Demir
Demir
Denißen
Dolzhenko
Dorlars
Dudnik
Egi
Eicher
El-Azab
Ellis
Fan
Farber
Finnegan
Fischer
Fletcher
Ford
Fresneau
Fürstner
Galenko
Galenko
Gao
Gao
Gao
Gao
Garbrecht
Gaurav R. Gupta
Giri
Gorla
Granchi
Granchi
Granger
Granger
Grazulevicius
Gribble
Gribble
Grierson
Guan
Guin
Gundla
Gupta
Gupta
Gupta
Habich
Hao
Hashmi
Hayashi
He
Heeres
Henry
Herr
Houlihan
Hranjec
Hu
Huang
Huang
Huo
Ibrahim
Imran
Imran
Ishikawa
Ismail
Jacobi
Jagrut Shah
Jia
Jiang
Jiang
Joo
Joo
Joo
Joule
Joule
Kalck
Kamijo
Kamlesh S. Vadagaonkar
Katritzky
Kawasaki
Kawasaki
Kawasaki
Kayet
Khan
Kim
Kotovskaya
Kouznetsov
Kraft
Kumar
Kumar
Kurzer
Lainton
Lamberth
Lamberth
Lang
Le
Lee
Leon
Li
Li
Li
Li
Li
Li
Li
Liao
Lindel
Lindsley
Liu
Liu
Liu
Liu
Lohr
Lorion
Luo
Luo
Lutz
Ma
Ma
Ma
Ma
Magnus
Mai
Manisha
Martin
Martin
McDougal
McGuigan
Meldal
Metz
Michael
Miller
Moses
Motoba
Mousseron-Canet
Mukherjee
Myers
Nasrollahzadeh
Ndakala
Neumann
Nicolaou
Novak
Obeid
Oderinde
Oldfield
Omar
Oparin
Orru
Orth
Ostrovskii
Pa
Pandey
Parameswaran
Park
Pathare
Peng
Peng
Percival
Perez
Peters
Pierre
Pisano
Portevin
Prier
Pye
Rani
Rao
Reddy
Regueiro-Ren
Robertson
Robins
Robins
Rodgers
Rogge
Roig
Roy
Royer
Ruiz-Sanchis
Safe
Sainsbury
Saito
Sakemi
Sathishkumar
Sato
Sawant
Schmidt
Schumacher
Seela
Shaflee
Shao
Shi
Shibasaki
Shobeiri
Sinfelt
Singer
Sivaprasad
Skladanowski
Somei
Somei
Somei
Somei
Song
Song
Song
Song
Song
Spande
Stauffer
Sumpter
Sun
Svetlik
Takase
Tao
Too
Trofimov
Truica-Marasescu
Tumir
Van der Eycken
van Muijlwijk-Koezen
Walton
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wang
Wasilke
Wei
Wilson
Wolfe
Wong
Wu
Wu
Wu
Wu
Xia
Xiao
Xiao
Xie
Yamamoto
Yan
Yan
Yang
Yao
Yao
Ye
Yella
Yoshida
Yu
Yu
Yu
Zabrocki
Zabrocki
Zammit
Zhang
Zhang
Zhang
Zhang
Zhao
Zhao
Zhong
Zhou
Zhou
Zhu
Zhu
Publication venue: 'Royal Society of Chemistry (RSC)'
Publication date: 01/01/2019

Crossref